Data Visualization

Load the tidyverse (which includes ggplot2), plus the maps and mapproj packages used later for map plots

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.4
## ✔ tidyr   0.8.0     ✔ stringr 1.3.0
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## Warning: package 'tibble' was built under R version 3.4.3
## Warning: package 'stringr' was built under R version 3.4.3
## Warning: package 'forcats' was built under R version 3.4.3
## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
library(mapproj)


A quick summary of the mpg dataset

summary(mpg)
##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00

To learn more about the mpg dataset

?mpg

3.2 First Steps:

Do cars with big engines use more fuel than cars with small engines?

3.2.1 The mpg Data Frame

ggplot2::mpg
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl   
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr>
##  1 audi         a4        1.80  1999     4 auto(l… f        18    29 p    
##  2 audi         a4        1.80  1999     4 manual… f        21    29 p    
##  3 audi         a4        2.00  2008     4 manual… f        20    31 p    
##  4 audi         a4        2.00  2008     4 auto(a… f        21    30 p    
##  5 audi         a4        2.80  1999     6 auto(l… f        16    26 p    
##  6 audi         a4        2.80  1999     6 manual… f        18    26 p    
##  7 audi         a4        3.10  2008     6 auto(a… f        18    27 p    
##  8 audi         a4 quat…  1.80  1999     4 manual… 4        18    26 p    
##  9 audi         a4 quat…  1.80  1999     4 auto(l… 4        16    25 p    
## 10 audi         a4 quat…  2.00  2008     4 manual… 4        20    28 p    
## # ... with 224 more rows, and 1 more variable: class <chr>

3.2.2 Creating a ggplot

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

The function geom_point() adds a layer of points to the plot, which creates a scatterplot.

3.2.3 A graphing template
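
The notes leave this section empty. As a sketch, the reusable pattern that the scatterplot code above follows can be written as a comment with placeholders in angle brackets (the same placeholder style used in section 3.10), followed by one concrete instance. The variable name p and the choice of cty for the y-axis are only for illustration.

```r
library(ggplot2)

# Template (placeholders in angle brackets are not runnable as written):
#   ggplot(data = <DATA>) +
#     <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# One concrete instance: <DATA> = mpg, <GEOM_FUNCTION> = geom_point,
# <MAPPINGS> = x = displ, y = cty
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = cty))
p
```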

3.2.4 Exercise

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

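  1. The code above draws a scatterplot of hwy (y-axis) against displ (x-axis).

  2. There are 234 rows and 11 columns in the mpg dataset.

  3. drv stands for the type of drive train of the vehicle: f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive.

  4. Below is the scatterplot for hwy (y-axis) vs cyl (x-axis).

  5. Below are scatterplots of class vs drv, with the axes in both orientations. They are not very useful: both variables are categorical, so many cars are drawn on top of each other at the same point.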

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = cyl, y = hwy))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = class, y = drv))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = drv, y = class))

3.3 Aesthetic mappings

Consider the outliers: a group of points with higher highway mileage than their engine size would suggest. Hypothesis: these cars are hybrids.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

You can map the colors of your points to the class variable to reveal the class of each car.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

The colours reveal that many of the unusual points are 2-seater cars. They are not hybrids but sports cars. Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage. In hindsight, these cars were unlikely to be hybrids since they have large engines.

Consider mapping class to the size aesthetic

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, size = class))
## Warning: Using size for a discrete variable is not advised.

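We can also map class to the alpha aesthetic, which controls the transparency of the points, or to the shape aesthetic, which controls their shape.

Note: By default, ggplot2 uses only 6 shapes at a time. Additional groups go unplotted when you use the shape aesthetic; here, that is the 62 suv points reported in the warning below.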

#Left
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, alpha = class))

#Right
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, shape = class))
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 7.
## Consider specifying shapes manually if you must have them.
## Warning: Removed 62 rows containing missing values (geom_point).
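
The warning above suggests specifying shapes manually. A minimal sketch of one way to do that, assuming scale_shape_manual() with the seven shape codes 0 to 6 (an arbitrary choice for illustration, not something the notes prescribe):

```r
library(ggplot2)

# mpg has seven classes, one more than the default shape palette supports,
# so supply seven shape codes explicitly (0:6 is an arbitrary choice)
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = 0:6)
p
```

With seven shapes supplied, no class should be dropped from the plot, although seven shapes remain hard to tell apart.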

The Aesthetic function aes()

For each aesthetic, you use aes() to associate the name of the aesthetic with a variable to display. The aes() function gathers together each of the aesthetic mappings used by a layer and passes them to the layer’s mapping argument.

The aesthetic properties of the geom can be set manually. For example, we can make all of the points in our plot blue:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

To set an aesthetic manually, set the aesthetic by name as an argument of your geom function; i.e. it goes outside of aes(). You’ll need to pick a value that makes sense for that aesthetic:

  • The name of a color as a character string.
  • The size of a point in mm.
  • The shape of a point as a number.
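
As a small sketch of setting all three kinds of value at once, the following fixes the color, size, and shape of every point outside aes(). The specific values (size 3, shape 17, a filled triangle) are arbitrary choices for illustration.

```r
library(ggplot2)

# Aesthetics set outside aes() apply to every point and produce no legend
p <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    color = "blue",  # name of a color as a character string
    size = 3,        # size of a point in mm
    shape = 17       # shape of a point as a number (17 = filled triangle)
  )
p
```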

3.3.1 Exercises

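  1. What’s gone wrong with this code? Why are the points not blue?

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

Answer: Because color = "blue" sits inside aes(), ggplot2 treats "blue" as a data value rather than a colour: it maps a constant variable to the colour aesthetic and picks a colour from the default scale. To set the colour manually, the argument needs to go outside of aes(), as an argument of geom_point().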

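  2. Which variables in mpg are categorical? Which variables are continuous? (Hint: type ?mpg to read the documentation for the dataset). How can you see this information when you run mpg?

Answer:
  • Categorical: manufacturer, model, trans, drv, fl, class (cyl and year are stored as integers but take only a few distinct values, so they can also be treated as categorical)
  • Continuous: displ, cty, hwy
  • When you run mpg, the type of each column is printed under its name: <chr> for character, <int> for integer, <dbl> for double.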
mpg
## # A tibble: 234 x 11
##    manufacturer model    displ  year   cyl trans   drv     cty   hwy fl   
##    <chr>        <chr>    <dbl> <int> <int> <chr>   <chr> <int> <int> <chr>
##  1 audi         a4        1.80  1999     4 auto(l… f        18    29 p    
##  2 audi         a4        1.80  1999     4 manual… f        21    29 p    
##  3 audi         a4        2.00  2008     4 manual… f        20    31 p    
##  4 audi         a4        2.00  2008     4 auto(a… f        21    30 p    
##  5 audi         a4        2.80  1999     6 auto(l… f        16    26 p    
##  6 audi         a4        2.80  1999     6 manual… f        18    26 p    
##  7 audi         a4        3.10  2008     6 auto(a… f        18    27 p    
##  8 audi         a4 quat…  1.80  1999     4 manual… 4        18    26 p    
##  9 audi         a4 quat…  1.80  1999     4 auto(l… 4        16    25 p    
## 10 audi         a4 quat…  2.00  2008     4 manual… 4        20    28 p    
## # ... with 224 more rows, and 1 more variable: class <chr>
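
The tibble printout truncates the last column, so a quick way to list the type of every column is to ask for the class of each one. This is a minimal sketch; the name col_types is only for illustration.

```r
library(ggplot2)

# First class of each column in mpg, returned as a named character vector
col_types <- vapply(mpg, function(col) class(col)[1], character(1))
print(col_types)
```

Character columns are natural candidates for categorical variables, while numeric and integer columns need a judgment call based on how many distinct values they take.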

3.4 Common Problems

  • Make sure that every ( is matched with a ) and every " is paired with another "
  • Sometimes you’ll run the code and nothing happens. Check the left-hand side of your console: if it shows a +, R doesn’t think you’ve typed a complete expression and is waiting for you to finish it. In this case, it’s usually easiest to start from scratch by pressing ESCAPE to abort the current command
  • One common problem when creating ggplot2 graphics is to put the + in the wrong place: it has to come at the end of the line, not the start (see below)
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

  • If you’re still stuck, try the help. You can get help about any R function by running ?function_name in the console, or selecting the function name and pressing F1 in RStudio. Don’t worry if the help doesn’t seem that helpful - instead skip down to the examples and look for code that matches what you’re trying to do.
  • Try googling the error message, as it’s likely someone else has had the same problem, and has gotten help online.
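
The point above about the placement of + is easier to see with the incorrect form next to the correct one. In this sketch the incorrect form is left as a comment so that the block still runs:

```r
library(ggplot2)

# Incorrect: the + starts the second line, so R treats the first line as a
# complete expression and then fails on the second
#   ggplot(data = mpg)
#   + geom_point(mapping = aes(x = displ, y = hwy))

# Correct: the + ends the first line, so R knows the expression continues
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
p
```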

3.5 Facets

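Another way to add additional variables, particularly useful for categorical variables, is to split your plot into facets: subplots that each display one subset of the data.

  • To facet your plot by a single variable, use facet_wrap(). Its first argument is a formula, created with ~ followed by a variable name. The variable should be discrete.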
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

  • To facet your plot on the combination of two variables, add facet_grid() to your plot call. The first argument of facet_grid() is also a formula, this time containing two variable names separated by a ~.
  • If you prefer to not facet in the rows or columns dimension, use a . instead of a variable name, e.g. + facet_grid(. ~ cyl)
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)
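
The . placeholder mentioned above can be sketched in both directions. The object names p_cols and p_rows are only for illustration.

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Facet in the columns dimension only: one panel per value of cyl, one row
p_cols <- base + facet_grid(. ~ cyl)

# Facet in the rows dimension only: one panel per value of drv, one column
p_rows <- base + facet_grid(drv ~ .)

p_cols
p_rows
```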

3.6 Geometric objects

The two plots below show the same x variable, the same y variable, and the same data, but each uses a different visual object to represent the data. In ggplot2 syntax, we say that they use different geoms.

A geom is the geometrical object that a plot uses to represent data. People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, and boxplots use boxplot geoms.

# left
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

# right
ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess'

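  • Not every aesthetic works with every geom.
  • You could set the shape of a point, but you couldn’t set the “shape” of a line. You can, however, set the linetype of a line: geom_smooth() below draws a separate line, with a different linetype, for each unique value of drv.
  • Further below, we will place multiple geoms in the same plot.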
ggplot(data = mpg) + 
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
## `geom_smooth()` using method = 'loess'

ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess'

ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
## `geom_smooth()` using method = 'loess'

ggplot(data = mpg) +
  geom_smooth(
    mapping = aes(x = displ, y = hwy, color = drv),
    show.legend = FALSE
  )
## `geom_smooth()` using method = 'loess'

  • To display multiple geoms in the same plot, add multiple geom functions to ggplot():
ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
## `geom_smooth()` using method = 'loess'

This, however, introduces some duplication in our code. Imagine if you wanted to change the y-axis to display cty instead of hwy. You’d need to change the variable in two places, and you might forget to update one. You can avoid this type of repetition by passing a set of mappings to ggplot(). ggplot2 will treat these mappings as global mappings that apply to each geom in the graph.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'loess'
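
To illustrate the benefit, here is a sketch of the cty change described above. Because the mapping is global, the variable is edited in one place and both layers pick it up.

```r
library(ggplot2)

# Only the global mapping changes (hwy -> cty); neither geom needs editing
p <- ggplot(data = mpg, mapping = aes(x = displ, y = cty)) +
  geom_point() +
  geom_smooth()
p
```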

If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only.

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) + 
  geom_smooth()
## `geom_smooth()` using method = 'loess'

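You can use the same idea to specify different data for each layer. Here, the smooth line displays just a subset of the mpg dataset, the subcompact cars. The local data argument in geom_smooth() overrides the global data argument in ggplot() for that layer only.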

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(mapping = aes(color = class)) + 
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
## `geom_smooth()` using method = 'loess'

3.7 Statistical transformations

Next, let’s take a look at a bar chart. Bar charts seem simple, but they are interesting because they reveal something subtle about plots. Consider a basic bar chart, as drawn with geom_bar(). The following chart displays the total number of diamonds in the diamonds dataset, grouped by cut.

A quick summary of the diamonds dataset

summary(diamonds)
##      carat               cut        color        clarity     
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655  
##                                     J: 2808   (Other): 2531  
##      depth           table           price             x         
##  Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
##  1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
##  Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
##  Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
##  3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
##  Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
##                                                                  
##        y                z         
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.720   1st Qu.: 2.910  
##  Median : 5.710   Median : 3.530  
##  Mean   : 5.735   Mean   : 3.539  
##  3rd Qu.: 6.540   3rd Qu.: 4.040  
##  Max.   :58.900   Max.   :31.800  
## 

A bar chart of the diamonds dataset

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut))

The y-axis displays count, which is not a variable in the diamonds dataset. Many graphs, like scatterplots, plot the raw values of your dataset. Other graphs, like bar charts, calculate new values to plot:

  • bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
  • smoothers fit a model to your data and then plot predictions from the model.
  • boxplots compute a robust summary of the distribution and then display a specially formatted box

Stat: short for statistical transformation, the algorithm used to calculate new values for a graph.

?geom_bar

Note: you can generally use geoms and stats interchangeably, because every geom has a default stat and every stat has a default geom. For example, you can recreate the previous plot using stat_count() instead of geom_bar():

ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))
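
To make the work done by the stat visible, the counts can also be computed by hand and plotted as they are. This sketch assumes dplyr::count() for the tally and geom_col() for the drawing; the name cut_counts is only for illustration.

```r
library(ggplot2)
library(dplyr)

# Compute the counts that stat_count() would otherwise compute internally
cut_counts <- count(diamonds, cut)  # columns: cut, n

# geom_col() plots the values as they are, with no further transformation
p <- ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = n))
p
```

The resulting chart should match the geom_bar() version apart from the y-axis label, which is n instead of count.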

Use stat_summary(), which summarises the y values for each unique x value, to draw attention to the summary that you’re computing

ggplot(data = diamonds) + 
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.ymin = min,
    fun.ymax = max,
    fun.y = median
  )
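
The fun.y, fun.ymin, and fun.ymax arguments match the ggplot2 2.2.1 release loaded at the top of these notes. In ggplot2 3.3.0 and later they are deprecated in favour of fun, fun.min, and fun.max. A sketch of the same plot for a newer installation (an assumption about the reader's setup, not the version used here):

```r
library(ggplot2)

# Same summary as above, written with the argument names used since
# ggplot2 3.3.0: fun (midpoint), fun.min (lower end), fun.max (upper end)
p <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )
p
```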

ggplot2 provides over 20 stats for you to use. Each stat is a function, so you can get help in the usual way, e.g. ?stat_bin. To see a complete list of stats, try the ggplot2 cheatsheet.

?stat_bin

3.7.1 Exercises

    1. What is the default geom associated with stat_summary()? How could you rewrite the previous plot to use that geom function instead of the stat function?

Answer: The default geom for stat_summary() is geom_pointrange(). For geom_pointrange(), the default stat is "identity", so to duplicate the previous plot we need to set stat = "summary" and supply the same min, max, and midpoint functions as before.

ggplot(data = diamonds) + 
  geom_pointrange(
    mapping = aes(x = cut, y = depth),
    stat = "summary",
    fun.ymin = min,
    fun.ymax = max,
    fun.y = median
  )

    2. What does geom_col() do? How is it different to geom_bar()?

Answer: geom_bar() makes the height of the bar proportional to the number of cases in each group (or if the weight aesthetic is supplied, the sum of the weights); its default stat is stat_count(). If you want the heights of the bars to represent values in the data, use geom_col() instead; it uses stat_identity() and leaves the data as is.

ggplot(data = diamonds) + 
  geom_col(mapping = aes(x = cut, y = depth))

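    3. Most geoms and stats come in pairs that are almost always used in concert. Read through the documentation and make a list of all the pairs. What do they have in common?

Answer: The full list is in the ggplot2 reference (http://ggplot2.tidyverse.org/reference/). Examples of pairs are geom_bar() and stat_count(), geom_histogram() and stat_bin(), geom_boxplot() and stat_boxplot(), and geom_smooth() and stat_smooth(). In each pair, the geom uses the stat by default and the stat uses the geom by default, and the two usually share a help page.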

    4. What variables does stat_smooth() compute? What parameters control its behaviour?

Answer: stat_smooth calculates:

  • y: predicted value
  • ymin: lower value of the confidence interval
  • ymax: upper value of the confidence interval
  • se: standard error

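Key parameters include method, which determines the smoothing method used to calculate the predictions (for example "loess" or "lm"); formula, the model formula; se, which toggles the confidence band; and level, the confidence level (0.95 by default). Other arguments are passed on to the fitting function.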

    5. In our proportion bar chart, we need to set group = 1. Why? In other words what is the problem with these two graphs?
ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop..))

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop..))

Answer: If group is not set to 1, every bar has prop == 1. By default geom_bar() treats each value of x as its own group, and the stat computes proportions within each group, so each bar makes up 100% of its group. Setting group = 1 makes the proportions relative to the whole dataset.

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = color, y = ..prop.., group=1))
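
To check that group = 1 gives proportions of the whole dataset, the values computed by the stat can be inspected with layer_data(). This sketch assumes ggplot2 3.3.0 or later, where after_stat(prop) is available as a replacement for the ..prop.. notation used above; the name props is only for illustration.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# One proportion per cut; with a single group they should add up to 1
props <- layer_data(p)$prop
print(props)
print(sum(props))
```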

3.8 Position Adjustments

You can colour a bar chart using either the colour aesthetic, or, more usefully, fill:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, colour = cut))

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = cut))

If you map the fill aesthetic to another variable, like clarity, the bars are automatically stacked. Each colored rectangle represents a combination of cut and clarity.

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity))

Stacking is performed automatically by the position adjustment specified by the position argument. If you don’t want a stacked bar chart, you can use one of three other options: "identity", "dodge" or "fill".

position = "identity" places each object exactly where it falls in the context of the graph. This is not very useful for bars, because it overlaps them. To see that overlapping we either need to make the bars slightly transparent by setting alpha to a small value, or completely transparent by setting fill = NA.

The identity position adjustment is more useful for 2d geoms, like points, where it is the default.

ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) + 
  geom_bar(alpha = 1/5, position = "identity")

ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) + 
  geom_bar(fill = NA, position = "identity")

position = "fill" works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups.

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

position = "dodge" places overlapping objects directly beside one another. This makes it easier to compare individual values.

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")

Another adjustment to note: position = "jitter". It is not useful for bar charts but is very useful for scatterplots, where many points can overlap each other (overplotting). position = "jitter" adds a small amount of random noise to each point. This spreads the points out because no two points are likely to receive the same amount of random noise.

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")

Adding randomness seems like a strange way to improve your plot, but while it makes your graph less accurate at small scales, it makes your graph more revealing at large scales. Because this is such a useful operation, ggplot2 comes with a shorthand for geom_point(position = "jitter"): geom_jitter().

To learn more about a position adjustment, look up the help page associated with each adjustment: ?position_dodge, ?position_fill, ?position_identity, ?position_jitter, and ?position_stack.
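
As a sketch of the shorthand and its explicit form, the second plot below spells out the adjustment with position_jitter(). The width, height, and seed values are arbitrary choices for illustration, and the seed argument assumes ggplot2 3.0.0 or later.

```r
library(ggplot2)

# Shorthand: geom_jitter() is geom_point(position = "jitter")
p_short <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_jitter()

# Explicit form: control the amount of noise and make it reproducible
p_explicit <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(
    position = position_jitter(width = 0.1, height = 0.5, seed = 42)
  )

p_short
p_explicit
```

Fixing the seed keeps the jittered positions the same each time the plot is built, which helps when a document is knitted repeatedly.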

3.8.1 Exercises

  1. What is the problem with this plot? How could you improve it?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_point()

Answer: A lot of points aren’t shown here because they overlap. Using geom_jitter() allows you to see them all.

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_jitter()

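  2. What parameters to geom_jitter() control the amount of jittering?

Answer: The amount of jitter is controlled by the width and height arguments, which set the amount of horizontal and vertical noise. Larger values spread the points further from their true positions, as the increasing width values below show.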

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_jitter(width=1)

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_jitter(width=5)

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_jitter(width=10)

  3. Compare and contrast geom_jitter() with geom_count().

Answer: geom_count() keeps every point at its true position and increases the size of a point when more observations overlap there, similar to showing the density of points at that location. geom_jitter() instead moves each point by a small random amount, so all the points are visible and the same size, at the cost of slightly distorting their positions.

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_count()

ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) + 
  geom_jitter()

  4. What’s the default position adjustment for geom_boxplot()? Create a visualisation of the mpg dataset that demonstrates it.

Answer: The default is for the boxplots to be dodged, so that boxplots sharing an x value are placed side by side without overlapping.

ggplot(data = mpg, mapping = aes(x = drv, y = hwy, color = class)) +
  geom_boxplot(position="dodge")

We can have them overlapping by using identity.

ggplot(data = mpg, mapping = aes(x = drv, y = hwy, color = class)) +
  geom_boxplot(position="identity")

3.9 Coordinate Systems

  • Coordinate systems are probably the most complicated part of ggplot2.
  • The default coordinate system is the Cartesian coordinate system. Other coordinate systems are described below:

  • coord_flip() switches the x and y axes. This is useful (for example), if you want horizontal boxplots. It’s also useful for long labels: it’s hard to get them to fit without overlapping on the x-axis

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot()

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + 
  geom_boxplot() +
  coord_flip()

  • coord_quickmap() sets the aspect ratio correctly for maps. This is very important if you’re plotting spatial data with ggplot2 (not covered in this book)
nz <- map_data("nz")

ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black")

nz <- map_data("nz")

ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_quickmap()

  • coord_polar() uses polar coordinates. Polar coordinates reveal an interesting connection between a bar chart and a Coxcomb chart.
bar <- ggplot(data = diamonds) + 
  geom_bar(
    mapping = aes(x = cut, fill = cut), 
    show.legend = FALSE,
    width = 1
  ) + 
  theme(aspect.ratio = 1) +
  labs(x = NULL, y = NULL)

bar + coord_flip()

bar + coord_polar()

3.9.1 Exercises

  1. Turn a stacked bar chart into a pie chart using coord_polar().

Stacked bar chart:

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), width = 1)

With coord_polar():

ggplot(data = diamonds) + 
  geom_bar(mapping = aes(x = cut, fill = clarity), width = 1) +
  coord_polar()
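
The plot above has one wedge per cut, which makes it a Coxcomb chart rather than a pie chart. A common way to get a true pie chart, sketched here as an alternative and not taken from the notes, is to stack everything into a single bar and map the angle to y with theta = "y". The name pie is only for illustration.

```r
library(ggplot2)

# A single stacked bar (x is a constant) wrapped around the circle:
# each clarity level gets a slice whose angle is proportional to its count
pie <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = factor(1), fill = clarity), width = 1) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL)
pie
```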

  2. What does labs() do? Read the documentation.

Answer: labs() modifies the axis, legend, and plot labels, such as the title, subtitle, and caption.

?labs
  3. What’s the difference between coord_quickmap() and coord_map()?

Answer: Looking back at the New Zealand spatial data:

nz <- map_data("nz")

ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_quickmap()

ggplot(nz, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "black") +
  coord_map() 

  • Initial observation: coord_map() eliminates some grid lines and shrinks the map a tiny bit.

  • coord_map() projects a portion of the spherical earth onto the flat 2D plot, using the Mercator projection by default. This requires transforming all geoms.
  • coord_quickmap() uses a quick approximation that sets the aspect ratio from the lat/long ratio. This is “quick” because the shapes don’t need to be transformed.

  4. What does the plot below tell you about the relationship between city and highway mpg? Why is coord_fixed() important? What does geom_abline() do?
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() + 
  geom_abline() +
  coord_fixed()

Answer:

  • The reference line drawn through the scatterplot of cty and hwy shows that cars get higher highway mpg than city mpg, and that the two are positively correlated.
  • geom_abline() with its default arguments draws the y = x line. If the points were on that line, the highway and city mpg would be the same.
  • coord_fixed() fixes the ratio between the physical representation of data units on the axes; the ratio represents the number of units on the y-axis equivalent to one unit on the x-axis. It also ensures that the reference line is at a 45 degree angle, which makes it easy to compare the highway and city mileage against what they would be if they were exactly the same.
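
A short sketch that spells out the defaults used in this plot and checks the visual impression numerically. The name share_above is only for illustration; slope = 1, intercept = 0, and ratio = 1 are the documented defaults written out explicitly.

```r
library(ggplot2)

# Share of cars whose highway mileage exceeds their city mileage,
# i.e. the share of points lying above the y = x line
share_above <- mean(mpg$hwy > mpg$cty)
print(share_above)

p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0) +  # the defaults, written out
  coord_fixed(ratio = 1)                   # one y unit per x unit
p
```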

3.10 The Layered Grammar of Graphics

In the previous sections in chapter 3, you learned much more than how to make scatterplots, bar charts, and boxplots. You learned a foundation that you can use to make any type of plot with ggplot2. To see this, let’s add position adjustments, stats, coordinate systems, and faceting to our code template:

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>

The template has seven parameters: the bracketed words that appear in the template above. In practice, you rarely need to supply all seven parameters to make a graph because ggplot2 will provide useful defaults for everything except the data, the mappings, and the geom function.

The seven parameters in the template compose the grammar of graphics: a formal system for building plots. The grammar of graphics is based on the insight that you can uniquely describe any plot as a combination of:

  • a dataset
  • a geom
  • a set of mappings
  • a stat
  • a position adjustment
  • a coordinate system
  • a faceting scheme
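
As a sketch of how the seven parts fit together, here is one plot that fills in every slot of the template explicitly. The particular choices (the diamonds data, a filled bar chart, flipped coordinates, facets by color) are only for illustration; several of them could be left to the defaults.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +                # dataset
  geom_bar(                                   # geom
    mapping = aes(x = cut, fill = clarity),   # mappings
    stat = "count",                           # stat (the default for geom_bar)
    position = "fill"                         # position adjustment
  ) +
  coord_flip() +                              # coordinate system
  facet_wrap(~ color)                         # faceting scheme
p
```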

Overall process summarized: see the R for Data Science text for more information.